Neural Activity and Decision-Making in Mice: Analysis on Spike Train Data from Visual Cortex

Abstract

Being able to predict behavioral outcomes from neural activity is a crucial part of neuroscience research, with many applications in understanding how decisions are made. This report analyzes spike train data from the visual cortex of mice performing a decision-making task, in which each trial results in a success (feedback = 1) or failure (feedback = -1). Using data from Steinmetz et al. (2019), 18 sessions across 4 mice were selected. We investigated the effectiveness of three classification models - Logistic Regression, Linear Discriminant Analysis (LDA), and k-Nearest Neighbors (kNN) - in predicting trial outcomes from neural activity and stimulus contrast levels.

Feature analysis revealed that mean firing rate was the most significant predictor across models, while contrast levels had mixed but nonetheless important effects. Principal Component Analysis (PCA) suggests subtle differences in neural firing across sessions and mice but no strong clustering. Furthermore, the results indicated that kNN achieved the highest accuracy (73.5%), outperforming the other two classification models. However, all models struggled with class imbalance, as the data contain far more successes than failures.

Although the models achieved reasonable accuracy, they were limited in handling the imbalanced data and the complexity of neural activity patterns. Further analysis could explore more advanced, nonlinear models rather than the linear methods used in this report. Addressing these challenges would strengthen the analysis and improve our understanding of decision-making.

I. Introduction

Understanding the link between neural activity and behavioral outcomes is a critical part of neuroscience and has significant implications for research and clinical applications.

In this project, we analyze data collected by Steinmetz et al. (2019) to study decision-making tasks in mice. The analysis focuses on 18 sessions obtained from four mice (Cori, Forssmann, Hench, and Lederberg), with each session consisting of hundreds of trials in which visual stimuli of varying contrast levels were presented on a screen. The mice were required to make a choice (left or right), and their decisions were classified as successes (feedback of 1) or failures (feedback of -1).

The primary objective of this report is to develop a predictive model that can accurately determine the outcome of each trial based on the neural activity data - represented by spike trains recorded in the visual cortex - and the stimulus parameters (left and right contrasts). To achieve this, the project employs logistic regression, linear discriminant analysis (LDA), and the k-nearest neighbors (kNN) method. The models will be compared to determine which strategy predicts behavioral outcomes best.

The project will answer two questions: which predictors are most important for predicting the outcome of each trial, and which binary classification method classifies outcomes most accurately.

II. Exploratory analysis

In order to create a model, we must first understand the data we will be handling. The data comprise 18 sessions across 4 mice, with 6 variables measured: feedback_type, contrast_left, contrast_right, time, spks, and brain_area.

II.1 Data structures across sessions

Now that we are familiar with the semantics of the dataset, we will visualize it to better understand the trends already present in the data. In the table presented, we can see that each session corresponds to a single mouse and that the number of trials varies between roughly 100 and 450. It can also be seen that many different types of neurons were recorded, with the number of neurons varying by session.

It is also important to know how the mice performed on the trials. The mice were successful 3608 times compared to 1473 failures, meaning they succeeded most of the time. Next, the cross-tabulation of left and right contrast levels gives further detail about the stimulus conditions across sessions. Some conditions occurred more often than others, especially trials with zero contrast on both the left and right.

Additionally, the plot shows which brain areas were measured in each session. Some areas were measured more than others, which is important to note because not all areas contribute equally to the data. This summary sheds light on the dataset that will be used for later modeling and analysis.

##    session_number mouse_name   date_exp n_trials n_neurons
## 1               1       Cori 2016-12-14      114       734
## 2               2       Cori 2016-12-17      251      1070
## 3               3       Cori 2016-12-18      228       619
## 4               4  Forssmann 2017-11-01      249      1769
## 5               5  Forssmann 2017-11-02      254      1077
## 6               6  Forssmann 2017-11-04      290      1169
## 7               7  Forssmann 2017-11-05      252       584
## 8               8      Hench 2017-06-15      250      1157
## 9               9      Hench 2017-06-16      372       788
## 10             10      Hench 2017-06-17      447      1172
## 11             11      Hench 2017-06-18      342       857
## 12             12  Lederberg 2017-12-05      340       698
## 13             13  Lederberg 2017-12-06      300       983
## 14             14  Lederberg 2017-12-07      268       756
## 15             15  Lederberg 2017-12-08      404       743
## 16             16  Lederberg 2017-12-09      280       474
## 17             17  Lederberg 2017-12-10      224       565
## 18             18  Lederberg 2017-12-11      216      1090
##                                                              neuron_types
## 1                                  ACA, MOs, LS, root, VISp, CA3, SUB, DG
## 2                                            CA1, VISl, root, VISpm, POST
## 3                  DG, VISam, MG, CA1, SPF, root, LP, MRN, POST, NB, VISp
## 4                   LGd, DG, TH, SUB, VPL, VISp, CA1, VISa, LSr, ACA, MOs
## 5                        VISa, root, CA1, SUB, DG, OLF, ORB, ACA, PL, MOs
## 6                                                 AUD, root, SSp, CA1, TH
## 7                                   VPL, root, CA3, LD, CP, EPd, SSp, PIR
## 8  ILA, TT, MOs, PL, LSr, root, LD, PO, CA3, VISa, CA1, LP, DG, VISp, SUB
## 9             TT, ORBm, PL, LSr, root, CA3, VISl, CA1, TH, VISam, VPL, LD
## 10   MB, VISp, SCm, SCsg, POST, DG, MRN, CA1, VISl, POL, root, GPe, VISrl
## 11                                            MOp, LSc, root, PT, CP, LSr
## 12             VISp, DG, SUB, LGd, PL, root, MOs, ACA, CA1, VISam, MD, LH
## 13 VISam, ZI, DG, CA1, LGd, MB, SCs, RN, MRN, SCm, ACA, PL, MS, root, MOs
## 14                     ORB, MOs, root, MRN, SCm, SCs, VISp, RSP, CA1, PAG
## 15                                  BLA, GPe, root, VPM, LGd, ZI, MB, CA3
## 16                                             SSs, SSp, MB, TH, LGd, CA3
## 17                                            root, VPL, VPM, RT, MEA, LD
## 18                           CP, ACB, OT, SI, SNr, LGd, ZI, CA3, root, TH
## 
##   -1    1 
## 1473 3608
##       
##           0 0.25  0.5    1
##   0    1371  194  326  454
##   0.25  179   99  179  317
##   0.5   397  166  111  163
##   1     438  423  159  105
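
The outcome and contrast summaries above were produced in R; a minimal Python sketch of the same tallies, using a tiny hypothetical trial list in place of the real session data, is:

```python
from collections import Counter

# Hypothetical per-trial records standing in for the Steinmetz sessions;
# the field names mirror the report's variables.
trials = [
    {"feedback": 1, "contrast_left": 0.0, "contrast_right": 0.25},
    {"feedback": -1, "contrast_left": 0.5, "contrast_right": 0.0},
    {"feedback": 1, "contrast_left": 0.0, "contrast_right": 0.25},
]

# tally successes/failures, and cross-tabulate the contrast pairs
feedback_counts = Counter(t["feedback"] for t in trials)
contrast_table = Counter((t["contrast_left"], t["contrast_right"]) for t in trials)
```

Run over the real 18 sessions, the same tallies would reproduce the 3608/1473 outcome split and the contrast table shown above.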

II.2. Neural activity examination

Next, neural activity was examined to better understand the mean firing rate across trials in each session. The spike trains were accessed and their mean values computed to represent the overall neural response during each trial. The visualizations show that most mean firing rates stayed within about 0.3 spikes per bin of one another. The fluctuation was fairly consistent across trials, which could suggest trial-to-trial variability; however, since the rates stayed within this 0.3 range, there appears to be no long-term drift or sudden shift to be concerned about.

Additionally, there is no clear positive or negative trend in most of the plots. This suggests there is no strong drift in neural activity over time, meaning the animals performed consistently throughout the trials and that, within each session, conditions remained steady.

It is worth noting that each session had a different range of average firing rates, which could mean that the neural recording conditions and task parameters changed between sessions. This could affect the analysis; however, it is to be expected, as the sessions were not all conducted on the same day, allowing other variables to influence the data.
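
The mean-firing-rate summary used throughout can be sketched as follows, assuming (as in the Steinmetz data) each trial's `spks` is a neurons-by-time-bins spike-count matrix; the matrix below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
spks = rng.poisson(0.05, size=(100, 40))   # synthetic (neurons x time bins) counts

# average spike count per neuron per time bin for this trial
n_neurons, n_bins = spks.shape
mean_firing = spks.sum() / (n_neurons * n_bins)
```

Computing this value for every trial in a session yields the per-trial series plotted above.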

II.3. Trial-By-Trial Variability

II.4. Homogeneity vs. Heterogeneity

A homogeneity versus heterogeneity plot helps assess the variability and consistency within the dataset. In this case, it can help determine whether neural responses are similar across mice or vary between them.

In the plot, Cori has the highest median firing rate of the four mice and Forssmann has the lowest. Lederberg has a wider distribution than the others, suggesting greater variability in neural activity. Lederberg also has the most outliers, indicating that some trials had significantly higher firing rates than others; this could affect later analysis and is important to note. Conversely, Forssmann has the least variability and the smallest spread.

Overall, the plot suggests heterogeneity in neural activity: the variation between the distributions indicates that activity patterns are not uniform across subjects, which could impact model generalization later on.

To check this, an ANOVA was conducted to test whether firing rates differ significantly between mice. With a p-value below 2e-16, the mean firing rate differs significantly between mice at any standard significance level (0.1, 0.05, 0.01).

Looking at homogeneity versus heterogeneity across sessions, there is considerable variability: some sessions have a much higher median firing rate than others. This points to session-to-session variability, meaning neural activity was not consistent across sessions, which aligns with the fact that each session came from a single mouse. Some sessions show a wide spread - sessions 7 and 13, for example - indicating high variability within those sessions, while others have tighter distributions - sessions 2, 6, 10, 12, and 18. Session 13 also has the most outliers, which might affect later analysis and modeling.

Now that we have a better understanding of the data, we can look at ways to approach it to ensure the best modeling.

##               Df Sum Sq Mean Sq F value Pr(>F)    
## mouse_name     3 0.1384 0.04615   369.7 <2e-16 ***
## Residuals   5077 0.6337 0.00012                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
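
The ANOVA above was run in R; an equivalent one-way ANOVA can be sketched in Python with `scipy.stats.f_oneway`, where the per-mouse firing rates below are synthetic stand-ins for the real per-trial values:

```python
import numpy as np
from scipy import stats

# Synthetic per-trial mean firing rates for each mouse; the group means
# and sizes are illustrative, not the report's actual values.
rng = np.random.default_rng(1)
cori      = rng.normal(0.040, 0.01, 200)
forssmann = rng.normal(0.030, 0.01, 200)
hench     = rng.normal(0.035, 0.01, 200)
lederberg = rng.normal(0.036, 0.01, 200)

# one-way ANOVA: does mean firing rate differ across mice?
f_stat, p_value = stats.f_oneway(cori, forssmann, hench, lederberg)
```

As in the R output, group means that differ by more than their standard errors produce a large F statistic and a tiny p-value.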

III. Data integration

III.1. PCA Models

PCA helps visualize high-dimensional data by projecting it onto a 2D or 3D plot. In this case, we project the neural data onto the first two principal components - shown on the axes - to observe whether trials cluster distinctly by mouse or session.

The two PCA plots - one colored by mouse and one colored by session - provide a powerful visual approach to data integration. Looking first at the mouse-colored plot, we see substantial overlap among data points, indicating that the largest sources of variance do not clearly separate trials by mouse. All of the mice show very similar patterns along the directions that account for the most variance, so while there may be subtle grouping, the mice are not distinctly separated in the first two PCs. Similarly, no clear session-level grouping emerges, indicating that session differences may be small compared to other sources of variation. These observations imply that any session- or mouse-related variability might be confined to higher-order components.

Given these results, we can pool information across all sessions and mice in a lower-dimensional space.
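
The projection onto the first two principal components can be sketched via the SVD; `X` here is a synthetic trial-by-feature matrix standing in for the pooled neural features:

```python
import numpy as np

# Synthetic trial-by-feature matrix standing in for the pooled neural data.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))

Xc = X - X.mean(axis=0)                      # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt[:2].T                          # coordinates on the first two PCs
explained = S**2 / (S**2).sum()              # variance-explained ratios
```

Plotting `pcs` colored by mouse or by session reproduces the kind of scatter plots discussed above.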

IV. Predictive modeling

Now that we have our training data, we can fit it to a model. The models predict the categorical feedback type from the left and right contrast levels and the mean firing rate of each trial. About 29% of trials are failures and 71% successes, reflecting the base rate of the outcome distribution.

Three models will be used to check and see which one is best at predicting the outcomes for this particular study.

IV.1. Logistic Regression

The first predictive model uses logistic regression. The model is fit to the training data and predictions are made using a 0.5 probability threshold. The model is evaluated using accuracy and a confusion matrix to see how well it performs.

## 
## Call:
## glm(formula = feedback ~ contrast_left + contrast_right + mean_firing, 
##     family = "binomial", data = training_data)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     0.14781    0.09671   1.528   0.1264    
## contrast_left   0.15515    0.08016   1.936   0.0529 .  
## contrast_right -0.11918    0.08192  -1.455   0.1457    
## mean_firing    21.88578    2.71367   8.065 7.32e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6118.2  on 5080  degrees of freedom
## Residual deviance: 6044.4  on 5077  degrees of freedom
## AIC: 6052.4
## 
## Number of Fisher Scoring iterations: 4
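
Although the model above was fit in R with `glm`, an equivalent sketch in Python (scikit-learn) illustrates the fit and the 0.5 threshold; the data below are synthetic stand-ins for the training set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training set: two contrast columns and a
# mean-firing column, with the outcome loosely tied to mean firing.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([
    rng.choice([0, 0.25, 0.5, 1.0], n),      # contrast_left
    rng.choice([0, 0.25, 0.5, 1.0], n),      # contrast_right
    rng.normal(0.035, 0.01, n),              # mean_firing
])
p = 1 / (1 + np.exp(-(0.15 + 20 * (X[:, 2] - 0.035))))
y = (rng.random(n) < p).astype(int)

model = LogisticRegression().fit(X, y)
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)   # 0.5 threshold
accuracy = (pred == y).mean()
```
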

IV.2. LDA Model

Now consider linear discriminant analysis (LDA), which separates the classes to make them easier to interpret. There is a positive coefficient for the left contrast, indicating that an increase in this predictor pushes the discriminant score toward the success class. In contrast, the right contrast has a negative coefficient, meaning that higher values make a failure more likely. Additionally, the coefficient on the mean firing rate is quite large, implying that small changes in this feature can have a large impact on the classification.

## Call:
## lda(feedback ~ contrast_left + contrast_right + mean_firing, 
##     data = training_data)
## 
## Prior probabilities of groups:
##        -1         1 
## 0.2899036 0.7100964 
## 
## Group means:
##    contrast_left contrast_right mean_firing
## -1     0.3217923      0.3250170  0.03207458
## 1      0.3500554      0.3237943  0.03515595
## 
## Coefficients of linear discriminants:
##                       LD1
## contrast_left   0.5877819
## contrast_right -0.4394808
## mean_firing    79.7919695
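
A comparable LDA fit can be sketched in Python (scikit-learn), again with synthetic stand-in data; the third predictor plays the role of mean firing:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in data: three predictors, with the third driving the outcome.
rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 3))
y = (X[:, 2] + 0.3 * rng.normal(size=n) > 0).astype(int)

lda = LinearDiscriminantAnalysis().fit(X, y)
coefs = lda.coef_.ravel()            # sign shows which class a predictor favors
scores = lda.decision_function(X)    # signed discriminant scores per trial
```

As in the R output, the sign and magnitude of each coefficient indicate how strongly a predictor pushes the discriminant score toward one class.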

IV.3. kNN Model

The k-nearest neighbors (kNN) model is a classification method that predicts a new point's class from the classes of its nearest neighbors in the training data. For this method we must set the k-value - how many neighbors are consulted when classifying a point - and a common heuristic is the square root of the number of data points. This gave k = 68, and the model was trained.

The model's minimal misclassification rate on the training data was approximately 29.1%, which still indicates a substantial classification error. The best-performing configuration used k = 68, meaning the 68-nearest-neighbors model minimized the training misclassification.

## 
## Call:
## train.kknn(formula = feedback ~ contrast_left + contrast_right +     mean_firing, data = training_data, kmax = k_value, distance = 2,     kernel = "optimal")
## 
## Type of response variable: nominal
## Minimal misclassification: 0.2910844
## Best kernel: optimal
## Best k: 68
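
The sqrt(n) heuristic for choosing k can be sketched in Python (scikit-learn); the data here are synthetic, sized so that sqrt(n) = 68 as in the report:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data sized so that sqrt(n) = 68, matching the report's k.
rng = np.random.default_rng(5)
n = 68 ** 2
X = rng.normal(size=(n, 3))
y = (X[:, 2] > 0).astype(int)

k = int(round(np.sqrt(n)))           # common sqrt(n) rule of thumb for k
knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
train_acc = knn.score(X, y)          # training accuracy = 1 - misclassification
```
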

V. Prediction performance on test sets

Now that all the models have been trained, we can try to test them on our test data to see if they give us valid responses.

V.1. Logistic Regression Model

The model output shows that contrast left is only marginally significant and contrast right is not significant, while the mean firing rate is highly significant, with a p-value indicating a strong predictor of feedback. The model has an accuracy of 72.5%, which looks high; however, the confusion matrix reveals problems. In particular, the model almost never predicts the failure class - sensitivity is only 3.6% - and instead labels nearly all cases as successes. As a result, the kappa statistic is close to 0, indicating almost no agreement beyond what would happen by chance. This outcome points to the class-imbalance challenges facing the model.

## [1] 0.725
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  -1   1
##         -1   2   2
##         1   53 143
##                                           
##                Accuracy : 0.725           
##                  95% CI : (0.6576, 0.7856)
##     No Information Rate : 0.725           
##     P-Value [Acc > NIR] : 0.5362          
##                                           
##                   Kappa : 0.0317          
##                                           
##  Mcnemar's Test P-Value : 1.562e-11       
##                                           
##             Sensitivity : 0.03636         
##             Specificity : 0.98621         
##          Pos Pred Value : 0.50000         
##          Neg Pred Value : 0.72959         
##              Prevalence : 0.27500         
##          Detection Rate : 0.01000         
##    Detection Prevalence : 0.02000         
##       Balanced Accuracy : 0.51129         
##                                           
##        'Positive' Class : -1              
## 
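
The headline metrics can be recomputed directly from the confusion matrix above (positive class = -1), which makes their origin explicit:

```python
# Recomputing the metrics from the logistic model's confusion matrix.
# Rows of the matrix are predictions, columns are the true labels.
tp, fp = 2, 2        # predicted -1: 2 true -1, 2 true 1
fn, tn = 53, 143     # predicted  1: 53 true -1, 143 true 1

accuracy    = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)                 # recall on the -1 class
specificity = tn / (tn + fp)
balanced    = (sensitivity + specificity) / 2
```

These reproduce the reported 72.5% accuracy, 3.6% sensitivity, 98.6% specificity, and 0.511 balanced accuracy.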

V.2. LDA Model Prediction

The LDA model achieved an overall accuracy of 72.5%, similar to the logistic regression model; however, it failed to predict any cases of the minority class, resulting in a kappa value of 0.

## Call:
## lda(feedback ~ contrast_left + contrast_right + mean_firing, 
##     data = training_data)
## 
## Prior probabilities of groups:
##        -1         1 
## 0.2899036 0.7100964 
## 
## Group means:
##    contrast_left contrast_right mean_firing
## -1     0.3217923      0.3250170  0.03207458
## 1      0.3500554      0.3237943  0.03515595
## 
## Coefficients of linear discriminants:
##                       LD1
## contrast_left   0.5877819
## contrast_right -0.4394808
## mean_firing    79.7919695
## [1] 0.725
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  -1   1
##         -1   0   0
##         1   55 145
##                                           
##                Accuracy : 0.725           
##                  95% CI : (0.6576, 0.7856)
##     No Information Rate : 0.725           
##     P-Value [Acc > NIR] : 0.5362          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : 3.305e-13       
##                                           
##             Sensitivity : 0.000           
##             Specificity : 1.000           
##          Pos Pred Value :   NaN           
##          Neg Pred Value : 0.725           
##              Prevalence : 0.275           
##          Detection Rate : 0.000           
##    Detection Prevalence : 0.000           
##       Balanced Accuracy : 0.500           
##                                           
##        'Positive' Class : -1              
## 

V.3. kNN Model

The kNN model achieved an overall accuracy of 73.5% on the test set. The confusion matrix shows that, of the 55 true -1 cases, the model identified only 3 and misclassified 52, giving a sensitivity of just 5.5%. Conversely, 144 of the 145 true 1 cases were classified correctly, so the specificity is much higher at 99.3%. The kappa statistic of 0.0669 indicates that the model is only slightly better than random chance. These results highlight the model's difficulty in detecting the minority class, suggesting that an alternative approach may be necessary to improve its performance.

## [1] 0.735
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  -1   1
##         -1   3   1
##         1   52 144
##                                           
##                Accuracy : 0.735           
##                  95% CI : (0.6681, 0.7948)
##     No Information Rate : 0.725           
##     P-Value [Acc > NIR] : 0.4105          
##                                           
##                   Kappa : 0.0669          
##                                           
##  Mcnemar's Test P-Value : 6.51e-12        
##                                           
##             Sensitivity : 0.05455         
##             Specificity : 0.99310         
##          Pos Pred Value : 0.75000         
##          Neg Pred Value : 0.73469         
##              Prevalence : 0.27500         
##          Detection Rate : 0.01500         
##    Detection Prevalence : 0.02000         
##       Balanced Accuracy : 0.52382         
##                                           
##        'Positive' Class : -1              
## 

V.4. ROC Comparison

Turning to the ROC curves, kNN performs best by this measure, even though the differences among the curves are relatively small. All three models performed above the line of chance, but none achieves a particularly high AUC. In practical terms, this means all the models still face challenges, because none has a notably high AUC value.
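
The ROC/AUC computation behind such a comparison can be sketched in Python (scikit-learn); the labels and scores below are synthetic stand-ins for the test set and a model's success probabilities:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic test-set labels (imbalanced like the report's) and synthetic
# model scores standing in for a classifier's success probabilities.
rng = np.random.default_rng(6)
y_true = (rng.random(200) < 0.725).astype(int)
scores = np.clip(0.7 + 0.1 * (y_true - 0.5) + 0.2 * rng.normal(size=200), 0, 1)

auc = roc_auc_score(y_true, scores)            # area under the ROC curve
fpr, tpr, _ = roc_curve(y_true, scores)        # points for plotting the curve
```

Repeating this for each model's scores and plotting the `(fpr, tpr)` curves together gives the comparison described above.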

VI. Discussion

This study aimed to develop a predictive model for trial outcomes in mice using neural activity data and stimulus contrast levels. Three classification models - logistic regression, LDA, and kNN - were tested on spike train data collected from visual cortex recordings. The models were then evaluated on their accuracy, sensitivity, specificity, and overall predictive performance.

The key finding was that the kNN model outperformed the other two, achieving the highest accuracy of the three. The LDA and logistic regression models largely failed to predict occurrences of the minority class, leading to near-zero kappa statistics that indicate performance no better than chance.

Furthermore, it is important to note the limitations of this analysis. The dataset was highly imbalanced, with many more success trials than failures, which led the logistic regression and LDA models to fail to recognize the failures. Additionally, the models used only contrast levels and mean firing rate, a simplistic feature set. Adding more variables could improve prediction accuracy, but would also require more complex models.
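
One common remedy for this imbalance, not applied in this report, is class weighting; a Python sketch (scikit-learn, synthetic data) shows how reweighting makes the minority class harder to ignore:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data: roughly three-quarters of outcomes are 1.
rng = np.random.default_rng(7)
n = 1000
X = rng.normal(size=(n, 3))
y = np.where(X[:, 2] + rng.normal(size=n) > -1.0, 1, -1)

plain    = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# reweighting makes the model predict the minority (-1) class more often
minority_plain    = (plain.predict(X) == -1).mean()
minority_weighted = (weighted.predict(X) == -1).mean()
```

This trades some overall accuracy for better sensitivity on the minority class, which is exactly the failure mode observed above.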

VII. Conclusion

This report provides a simple analysis of predicting behavioral outcomes from neural activity. The results suggest that neural activity and stimulus contrast can partially predict behavioral decisions in mice. The strong association of mean firing rate with feedback highlights the importance of neural response intensity in decision-making tasks. The shortcomings of the linear classification models indicate that more complex methods would be necessary for more accurate predictions.

References

Chen, Shizhe. “StatsDataScience Notes.” Jupyter Notebook Viewer, nbviewer.org/github/ChenShizhe/StatDataScience/blob/master/Notes/AppendixBProgramming.ipynb. Accessed 18 Jan. 2025.

Steinmetz, N.A., Zatka-Haas, P., Carandini, M. et al. Distributed coding of choice, action and engagement across the mouse brain. Nature 576, 266–273 (2019). https://doi.org/10.1038/s41586-019-1787-x. Accessed 18 Jan. 2025.

Conversations with ChatGPT: https://chatgpt.com/share/67d8879f-4194-8007-8de6-2cad1e41ee1d https://chatgpt.com/share/67d88bf3-2be8-8007-a7ea-dd4bd532b66e